Evaluating Human-Machine Conversation for Appropriateness
Abstract
Evaluation of complex, collaborative dialogue systems is a difficult task. Traditionally, developers have relied upon subjective feedback from the user and parametrisation over observable metrics. However, both models place some reliance on the notion of a task; that is, the system is helping the user to achieve some clearly defined goal, such as booking a flight or completing a banking transaction. It is not clear that such metrics are as useful when dealing with a system that has a more complex task, or even no definable task at all beyond maintaining and performing a collaborative dialogue. Working within the EU-funded COMPANIONS program, we investigate the use of appropriateness as a measure of conversation quality, the hypothesis being that good companions need to be good conversational partners. We report initial work in the direction of annotating dialogue for indicators of good conversation, including the annotation and comparison of the output of two generations of the same dialogue system.
Similar resources
Institute of Informatics Logics and Security Studies Evaluating Human-Machine Conversation for Appropriateness
Evaluating Human-Computer Conversation in Companions
We report on the first evaluation of the Companions project prototypes. We give preliminary results from our phase one evaluation, using known and well-understood dialogue metrics. We also give a first indication of the directions we plan to take to evaluate increasingly sophisticated conversational systems, using measures of coherence and appropriateness.
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
The Appropriateness of Educational Programs' Objectives for Professional Needs: The Viewpoints of Khorramabad School of Nursing and Midwifery Graduates
Introduction: Evaluating the educational programs from the viewpoints of graduates may identify the weaknesses of such programs and provide the opportunity for their improvement. This study was performed to determine the appropriateness of educational programs for professional needs from the viewpoints of graduates of Khorramabad School of Nursing and Midwifery. Methods: This descriptive cros...
A New Statistical Model for Evaluation Interactive Question Answering Systems Using Regression
The development of computer systems and extensive use of information technology in the everyday life of people have just made it more and more important for them to make quick access to information that has received great importance. Increasing the volume of information makes it difficult to manage or control. Thus, some instruments need to be provided to use this information. The QA system is ...
Publication date: 2010